feat(ci): token-limits gate (tiktoken, no key) + byte/token partition#39

Merged
JacobPEvans-personal merged 3 commits into main from feat/token-limits-gate on Jun 20, 2026

Conversation

@JacobPEvans-personal

Summary

Completes the long-running transition from a raw byte file-size gate to a token-based limit for AI-read docs, using a no-auth, offline tokenizer (tiktoken) instead of atc (which needs ANTHROPIC_API_KEY). The two gates now coexist with mutually exclusive coverage: token budgets for Markdown docs, byte limits for everything else.

Changes

  • _token-limits.yml (new, reusable): counts tokens with public/offline tiktoken (pip install tiktoken, no secret), per-repo config in .token-limits.yaml; sparse-checkouts the shared counter (same pattern as the watchdog).
  • scripts/check-token-limits.py (new): fnmatch budgets (most-restrictive match wins), exclude globs, skips non-token-gated + binary files; exit 1 on violation. A file is token-gated iff it matches a limits pattern.
  • _file-size.yml: when a .token-limits.yaml is present, drops .md from its scan so Markdown docs are governed only by the token gate — no file is double-gated. Repos without .token-limits.yaml are unaffected.
  • _ci-gate.yml: new token_limits input + Token Limits job wired into the Merge Gate needs/allowed-skips.
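The counting loop described for scripts/check-token-limits.py can be sketched roughly as follows. This is a minimal illustration, not the merged script: `find_violations` and `load_counter` are hypothetical names, the NUL-byte test for binary files is an assumption, and the budget lookup is passed in as a callable so the sketch stays independent of the config format.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple


def load_counter() -> Callable[[str], int]:
    """Return a token counter backed by tiktoken's o200k_base encoding."""
    import tiktoken  # imported lazily; only needed when real counts are wanted

    enc = tiktoken.get_encoding("o200k_base")
    # disallowed_special=() treats text such as "<|endoftext|>" as plain text
    # instead of raising, which is the safer choice for arbitrary docs.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def find_violations(
    paths: Iterable[str],
    budget_for: Callable[[str], Optional[int]],
    count: Callable[[str], int],
) -> List[Tuple[str, int, int]]:
    """Return (path, tokens, budget) for every file over its budget."""
    violations = []
    for path in paths:
        budget = budget_for(path)
        if budget is None:
            continue  # not token-gated: left to the byte-size gate
        data = Path(path).read_bytes()
        if b"\0" in data:
            continue  # assumed heuristic: a NUL byte means binary
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            continue  # undecodable content is treated as binary too
        tokens = count(text)
        if tokens > budget:
            violations.append((path, tokens, budget))
    return violations
```

A caller would print the returned tuples and call `sys.exit(1)` when the list is non-empty, which matches the "exit 1 on violation" behaviour above. Passing `count` in explicitly also makes the loop testable without installing tiktoken.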

Config shape

# .token-limits.yaml
defaults: { max_tokens: 2000 }
exclude: ['TERRAFORM.md']      # machine-generated reference: neither gate
limits:
  AGENTS.md: 2000
  '*README.md': 1500
  '*.md': 2000                  # catch-all so all .md are budgeted
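To make the matching rule concrete, here is a small sketch of how this config could resolve to a per-file budget under the "most-restrictive match wins" rule from the description. `budget_for` is a hypothetical helper and the config is inlined as a dict to avoid a YAML dependency. The `defaults` block is not consulted here because its exact role is not spelled out above.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirrors the .token-limits.yaml example, inlined for illustration.
CONFIG = {
    "defaults": {"max_tokens": 2000},
    "exclude": ["TERRAFORM.md"],
    "limits": {"AGENTS.md": 2000, "*README.md": 1500, "*.md": 2000},
}


def budget_for(path: str, config: dict = CONFIG) -> Optional[int]:
    """Return the token budget for path, or None if it is not token-gated."""
    if any(fnmatchcase(path, pat) for pat in config["exclude"]):
        return None  # excluded files are checked by neither gate
    matches = [
        budget
        for pat, budget in config["limits"].items()
        if fnmatchcase(path, pat)
    ]
    # Most-restrictive match wins: take the smallest matching budget.
    return min(matches) if matches else None
```

With this sketch, `docs/README.md` matches both `*README.md` and `*.md` and resolves to 1500, while `main.tf` matches nothing and returns `None`. Note that fnmatch's `*` also matches `/`, so `*README.md` covers READMEs in subdirectories. A first-match variant would return the first hit in `limits` order instead of the minimum.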

Test plan

  • actionlint clean on all three workflows (shellcheck included — no SC2053/SC2254 suppressions; the partition uses an extension drop, not dynamic globs).
  • Counter verified locally with real tiktoken (o200k_base): correct per-pattern budgets, exclude honored, non-.md skipped, exit 1 on over-budget.
  • No secrets required anywhere.

First consumer: dryvist/tofu-unifi (follow-up PR) flips file_size-only → file_size + token_limits.

…tion

New reusable _token-limits.yml budgets AI-read Markdown docs via the public,
offline tiktoken tokenizer (no ANTHROPIC_API_KEY); per-repo config in
.token-limits.yaml. _ci-gate.yml gains a token_limits input + a Token Limits
job wired into the Merge Gate. _file-size.yml drops .md from its scan when a
.token-limits.yaml is present, so each file is governed by exactly one gate
(token budgets for docs, byte limits for everything else).

- scripts/check-token-limits.py: fnmatch budgets (most-restrictive match wins),
  exclude globs, skips non-token-gated + binary files; exit 1 on violation
- counter verified locally with tiktoken (o200k_base); actionlint clean

Assisted-by: Claude:claude-opus-4-8[1m]

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new offline token-limit checker script (scripts/check-token-limits.py) that uses tiktoken to enforce token budgets defined in .token-limits.yaml. Feedback was provided to make the YAML parsing logic more robust by explicitly validating the types of the parsed configuration elements, preventing potential CI crashes if the configuration file is malformed.

Comment thread scripts/check-token-limits.py Outdated
…formed)

Per review on #39 (gemini-code-assist): a malformed config (e.g. limits as a
list, or non-int budgets) previously crashed the run with AttributeError/
TypeError. Now type-check parsed limits/exclude/defaults and degrade to
'nothing token-gated' instead of failing CI.

Assisted-by: Claude:claude-opus-4-8[1m]
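
The guard described in this commit could look roughly like the sketch below: type-check each parsed section and fall back to an empty value, so a malformed file leaves nothing token-gated rather than raising. `sanitize_config` is a hypothetical name, and the decision to drop individual bad entries (instead of discarding the whole `limits` mapping) is an assumption.

```python
from typing import Any, Dict


def sanitize_config(raw: Any) -> Dict[str, Any]:
    """Return a well-typed config; malformed parts degrade to empty values."""
    cfg = raw if isinstance(raw, dict) else {}
    limits = cfg.get("limits")
    exclude = cfg.get("exclude")
    defaults = cfg.get("defaults")

    clean_limits = {}
    if isinstance(limits, dict):
        for pattern, budget in limits.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if (
                isinstance(pattern, str)
                and isinstance(budget, int)
                and not isinstance(budget, bool)
            ):
                clean_limits[pattern] = budget

    clean_exclude = (
        [p for p in exclude if isinstance(p, str)]
        if isinstance(exclude, list)
        else []
    )
    clean_defaults = defaults if isinstance(defaults, dict) else {}
    return {
        "limits": clean_limits,
        "exclude": clean_exclude,
        "defaults": clean_defaults,
    }
```

For example, `limits` given as a list, or a budget written as the string `"2000"`, yields an empty or reduced `limits` mapping instead of an AttributeError or TypeError.
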
Per review direction: prefer trusted community tooling and keep custom code to the
bare minimum. Cuts the 105-line script to ~28 lines of logic that call the public,
offline tiktoken tokenizer directly (no API key, no unvetted third-party tool).
Retained: first-match glob budgets, exclude, malformed-config guard.

Assisted-by: Claude:claude-opus-4-8[1m]
@JacobPEvans-personal JacobPEvans-personal merged commit 34f34ce into main Jun 20, 2026
1 check passed